Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and for learning to extract symbolic knowledge from the World Wide Web.
A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
Repository of online information sources that serve as test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).